Auxiliary variables in conditional Gaussian mixtures for automatic speech recognition
Abstract
In previous work, we presented a case study that used an estimated pitch value as the conditioning variable in conditional Gaussians. It showed the utility of hiding the pitch values in certain situations and of modeling pitch independently of the hidden state in others. Because that work used only single conditional Gaussians, we extend it here to conditional Gaussian mixtures in the emission distributions, which makes the approach more comparable to state-of-the-art automatic speech recognition. We also introduce a rate-of-speech (ROS) variable within the conditional Gaussian mixtures. We find that, under the current methods, using observed pitch or ROS in the recognition phase does not provide improvement. However, systems trained on pitch or ROS may improve on the baseline in the recognition phase when the pitch or ROS is marginalized out.
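To illustrate the difference between using an observed auxiliary value and marginalizing it out, here is a minimal scalar sketch. It is not the paper's implementation. It assumes each mixture component has a mean that is linear in the auxiliary value `a` (mean = `b * a + c`) and that `a` follows a Gaussian prior; the function names, the tuple layout `(w, b, c, var)`, and all numeric values are hypothetical.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def emission_observed(x, a, components):
    """p(x | a, state): auxiliary value a (e.g. pitch or ROS) is observed.

    Each component is (weight, slope b, offset c, variance); the component
    mean is assumed to shift linearly with a.
    """
    return sum(w * gauss_pdf(x, b * a + c, var) for (w, b, c, var) in components)


def emission_marginalized(x, components, aux_mean, aux_var):
    """p(x | state): a ~ N(aux_mean, aux_var) is integrated out analytically.

    For a linear-Gaussian component the integral is again Gaussian, with
    mean b * aux_mean + c and variance var + b**2 * aux_var.
    """
    return sum(
        w * gauss_pdf(x, b * aux_mean + c, var + b * b * aux_var)
        for (w, b, c, var) in components
    )


# Hypothetical two-component mixture for one hidden state.
components = [(0.6, 0.5, -1.0, 1.0), (0.4, -0.3, 2.0, 0.5)]
```

Under these assumptions, `emission_observed` corresponds to using the measured pitch or ROS during recognition, while `emission_marginalized` corresponds to a system trained with the auxiliary variable but recognizing without it.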
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Using Gaussian Mixtures for Hindi Speech Recognition System
The goal of an automatic speech recognition (ASR) system is to accurately and efficiently convert a speech signal into a text message independent of the device, speaker or the environment. In general, the speech signal is captured and pre-processed at the front-end for feature extraction and evaluated at the back-end using the Gaussian mixture hidden Markov model. In this statistical approach since the eva...
Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition
In standard automatic speech recognition (ASR), hidden Markov models (HMMs) calculate their emission probabilities by an artificial neural network (ANN) or a Gaussian distribution conditioned only upon the hidden state variable. Recent work [12] showed the benefit of conditioning the emission distributions also upon a discrete auxiliary variable, which is observed in training and hidden in reco...
Publication date: 2002